26. Quiz - Model Tuning
Let's continue exploring the parameter space for our question classification model.
As a first step, split your dataset: use 90% as training data and set aside the remaining 10% as unseen test data. You will use this held-out 10% to answer the question: what is the accuracy of the best model you trained?
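The idea behind a seeded 90/10 holdout split can be sketched in plain Python. This is only an illustration, not the course's solution: the helper name `split_holdout` and the toy `rows` list are hypothetical. If you are working with a Spark DataFrame, `randomSplit([0.9, 0.1], seed=42)` plays the same role, though its split sizes are only approximately 90/10.

```python
import random


def split_holdout(rows, test_fraction=0.1, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists.

    Hypothetical helper for illustration only.
    """
    shuffled = list(rows)                   # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # a fixed seed gives a repeatable order
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for the labelled questions (an assumption for the demo).
rows = list(range(100))
train, test = split_holdout(rows)
print(len(train), len(test))
```

Because the seed is fixed, rerunning the split yields the same train and test rows, which keeps the reported accuracy comparable between runs.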
On the 90% training portion, find the most accurate logistic regression model using 3-fold cross-validation with the following parameter grid:
- CountVectorizer vocabulary size: [1000, 5000]
- LogisticRegression regularization parameter: [0.0, 0.1]
- LogisticRegression maximum number of iterations: [10]
Set the random seeds of all stages of the pipeline to 42.
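To get a feel for how much work this grid search involves, the grid can be expanded by hand. The sketch below is plain Python and does not train anything. The key names `vocabSize`, `regParam`, and `maxIter` assume the quiz targets PySpark ML's `CountVectorizer` and `LogisticRegression` parameters, and `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Grid from the quiz. Key names assume PySpark ML parameter names.
param_grid = {
    "vocabSize": [1000, 5000],  # CountVectorizer vocabulary size
    "regParam": [0.0, 0.1],     # LogisticRegression regularization parameter
    "maxIter": [10],            # LogisticRegression maximum number of iterations
}
NUM_FOLDS = 3  # 3-fold cross-validation


def expand_grid(grid):
    """Return every parameter combination as a dict (Cartesian product)."""
    names = list(grid)
    value_lists = (grid[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
total_fits = len(combinations) * NUM_FOLDS

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {NUM_FOLDS} folds = {total_fits} model fits")
```

The grid has 2 x 2 x 1 = 4 combinations, so 3-fold cross-validation fits 12 models on the training portion before the best combination is evaluated on the held-out 10%.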